Identification of new drug classification terms in textual resources

Authors

Corinna Kolárik

Martin Hofmann-Apitius

Marc Zimmermann

Juliane Fluck

Abstract

Knowledge about the biological effects of small molecules helps in understanding biological processes and supports the development of new therapeutic agents. DrugBank is a high-quality database that provides such information about drugs, including annotation of drug effects and classification of therapeutic effects. However, to broaden the scope of such a database in classifying and annotating drugs, systems are needed that automatically extract classification terms and the corresponding drug annotations. We have developed an approach for identifying new terms in unstructured text that provide information about drug properties. It is based on identifying and extracting phrases that correspond to lexico-syntactic patterns, so-called Hearst patterns, and that contain drug names and directly related drug annotation terms. Such phrases could be identified with high performance in DrugBank text (0.89 F-score) and in Medline abstracts (0.83 F-score). Compared with the DrugBank annotation terminology, a large number of new drug annotation terms could be found. The evaluation of terms extracted from Medline showed that 29-53% of them are new, valid drug property terms. They could be assigned to existing drug property classes and to new classes not provided by the DrugBank drug annotation. We conclude that our system can support database content updates by providing additional drug descriptions of pharmacological effects not yet found in databases like DrugBank. Moreover, we propose that automatic normalization of terms improves the annotation and the retrieval of relevant database entries.

Supplementary information: Supplementary data are available at Bioinformatics online.
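
The pattern-based extraction described in the abstract can be illustrated with a minimal sketch. This is not the authors' system: the drug lexicon `DRUG_NAMES`, the regular expression `HEARST`, and the function `extract_drug_terms` are hypothetical names introduced here, and the class term is crudely approximated as the one or two words before the cue phrase. It covers only patterns in which the class term precedes the examples ("such as", "including", "especially").

```python
import re

# Hypothetical mini-lexicon; a real system would use a full drug-name dictionary.
DRUG_NAMES = {"aspirin", "ibuprofen", "warfarin", "heparin"}

# Hearst-style pattern: "<class term> such as|including|especially <items>".
# Assumption: the class term is the one or two words right before the cue.
HEARST = re.compile(
    r"(?P<cls>[A-Za-z-]+(?:\s+[A-Za-z-]+)?)\s*,?\s+"
    r"(?:such as|including|especially)\s+"
    r"(?P<items>[^.;]+)",
    re.IGNORECASE,
)


def extract_drug_terms(text):
    """Return sorted (drug, class term) pairs found via the pattern above."""
    pairs = set()
    for match in HEARST.finditer(text):
        cls = match.group("cls").lower()
        # Keep only the listed items that are known drug names.
        for token in re.findall(r"[A-Za-z-]+", match.group("items")):
            if token.lower() in DRUG_NAMES:
                pairs.add((token.lower(), cls))
    return sorted(pairs)


if __name__ == "__main__":
    sample = (
        "Nonsteroidal anti-inflammatory drugs such as aspirin and ibuprofen "
        "reduce pain. Anticoagulants, including warfarin and heparin, "
        "increase bleeding risk."
    )
    for drug, cls in extract_drug_terms(sample):
        print(f"{drug} -> {cls}")
```

A production pipeline would likely replace the two-word window with proper noun-phrase chunking and would need to handle patterns where the class term follows the examples (for instance "X and other Y").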

Similar resources

Using Fuzzy LR Numbers in Bayesian Text Classifier for Classifying Persian Text Documents

Text classification is an important research field in information retrieval and text mining. The main task in text classification is to assign text documents to predefined categories based on the documents' contents and labeled training samples. Since word detection is a difficult and time-consuming task in the Persian language, a Bayesian text classifier is an appropriate approach to deal with different...

Identification, classification, and evaluation of supply chain risks in concrete dams in terms of cost

Iran is one of the pioneers in dam construction, which is considered one of the country's greatest industries. In addition, dwindling water sources are drawing attention to dam construction. The process of dam construction, the supply of materials, and related services need great financial resources, and dam construction is also a time-consuming process. Therefore, it is important paying attenti...

Linguagrid: a network of Linguistic and Semantic Services for the Italian Language

In order to handle the increasing amount of textual information available on the web today and exploit the knowledge latent in this mass of unstructured data, a wide variety of linguistic knowledge and resources (Language Identification, Morphological Analysis, Entity Extraction, etc.) is crucial. In the last decade LRaas (Language Resource as a Service) emerged as a novel paradigm for publish...

Textual Metadiscourse Resources in Research Articles*

This study was motivated by three factors, which also contribute to its significance for today's academic writing. First, research articles are a significant means of communication between writers all over the world. Second, persuasion and organization are crucial notions in academic writing, where authors have to consider academic audiences and their needs. Third, some writers ar...

Journal title:

Bioinformatics

Volume 23, Issue 13

Pages -

Publication date: 2007